User-friendly biplots in R
Centre for Multi-Dimensional Data Visualisation (MuViSU)
muvisu@sun.ac.za
SASA 2024
Aim: Canonical variate analysis (CVA) is a dimension reduction technique that maximises the variation between classes while minimising the variation within classes.
This is achieved as follows:
The classical variance decomposition \[\mathbf{T}=\mathbf{B}+\mathbf{W},\]
has the following analogue in this setting: \[ \mathbf{X}'\mathbf{X} = \bar{\mathbf{X}}'\mathbf{C}\bar{\mathbf{X}} + \mathbf{X}'[\mathbf{I} - \mathbf{G}(\mathbf{G}'\mathbf{G})^{-1}\mathbf{C}(\mathbf{G}'\mathbf{G})^{-1}\mathbf{G}']\mathbf{X}, \] where \(\mathbf{G}\) denotes the \(n \times G\) class indicator matrix and \(\bar{\mathbf{X}} = (\mathbf{G}'\mathbf{G})^{-1}\mathbf{G}'\mathbf{X}\) the matrix of class means.
The choice of \(\mathbf{C}\) determines the variant of CVA (see the weightedCVA argument below).
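For the weighted variant, \(\mathbf{C} = \mathbf{G}'\mathbf{G}\). The decomposition can be checked numerically in base R; the sketch below uses the iris data and is purely illustrative, not the package implementation:

```r
# Sketch (not the package code): verify X'X = B + W on iris, with
# C = G'G (the weighted choice) and G the species indicator matrix.
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
G <- model.matrix(~ Species - 1, data = iris)    # n x G indicator matrix
C <- t(G) %*% G                                  # weighted CVA
GtGi <- solve(t(G) %*% G)
Xbar <- GtGi %*% t(G) %*% X                      # matrix of class means
B <- t(Xbar) %*% C %*% Xbar                      # between-class matrix
W <- t(X) %*% (diag(nrow(X)) - G %*% GtGi %*% C %*% GtGi %*% t(G)) %*% X
all.equal(t(X) %*% X, B + W)                     # TRUE up to rounding
```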
Find a linear mapping
\[\mathbf{Y}=\mathbf{X}\mathbf{M}, \tag{1}\]
such that \[\frac{\mathbf{m}'\mathbf{B}\mathbf{m}}{\mathbf{m}'\mathbf{W}\mathbf{m}} \tag{2}\] is maximised, subject to \(\mathbf{m}'\mathbf{W}\mathbf{m}=1\).
It can be shown that this leads to the following equivalent eigen equations:
\[ \mathbf{W}^{-1}\mathbf{BM} = \mathbf{M \Lambda} \tag{3} \]
\[ \mathbf{BM} = \mathbf{WM \Lambda} \tag{4} \]
\[ (\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}) (\mathbf{W}^{\frac{1}{2}}\mathbf{M}) = (\mathbf{W}^{\frac{1}{2}} \mathbf{M}) \mathbf{\Lambda} \tag{5} \]
with \(\mathbf{M'BM}= \mathbf{\Lambda}\) and \(\mathbf{M'WM}= \mathbf{I}\).
Since the matrix \(\mathbf{W}^{-\frac{1}{2}} \mathbf{B} \mathbf{W}^{-\frac{1}{2}}\) is symmetric and positive semi-definite, the eigenvalues in the matrix \(\mathbf{\Lambda}\) are non-negative and can be ordered in decreasing order. Since \(\operatorname{rank}(\mathbf{B}) = \min(p, G-1)\), only the first \(\operatorname{rank}(\mathbf{B})\) eigenvalues are non-zero. We form the canonical variates with the transformation
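The symmetric form of the eigen equation suggests a direct computation. The base-R sketch below (weighted CVA with \(\mathbf{C}=\mathbf{G}'\mathbf{G}\) on the iris data; an illustration, not the package implementation) forms \(\mathbf{W}^{-\frac{1}{2}}\), solves the symmetric eigenproblem and recovers \(\mathbf{M}\) with \(\mathbf{M}'\mathbf{W}\mathbf{M} = \mathbf{I}\) and \(\mathbf{M}'\mathbf{B}\mathbf{M} = \mathbf{\Lambda}\):

```r
# Base-R sketch of the eigen equations (weighted CVA on iris, C = G'G).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
G <- model.matrix(~ Species - 1, data = iris)
Xbar <- solve(t(G) %*% G) %*% t(G) %*% X
B <- t(Xbar) %*% t(G) %*% G %*% Xbar               # between-class matrix
W <- t(X) %*% X - B                                # within-class matrix
eW <- eigen(W, symmetric = TRUE)
Wih <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)  # W^(-1/2)
eS <- eigen(Wih %*% B %*% Wih, symmetric = TRUE)   # eigenvalues in decreasing order
M <- Wih %*% eS$vectors                            # so that M'WM = I
lambda <- eS$values                                # M'BM = diag(lambda)
sum(lambda > 1e-8)                                 # rank(B) = min(p, G-1) = 2
```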
\[ \bar{\mathbf{Y}} = \bar{\mathbf{X}}\mathbf{M}.\tag{6} \]
The first two canonical variates are given by:
\[\bar{\mathbf{Z}}=\bar{\mathbf{Y}}\mathbf{J}_2=\bar{\mathbf{X}}\mathbf{M}\mathbf{J}_2 \tag{7}\] where \(\mathbf{J}_2'=[\mathbf{I}_2 \quad \mathbf{0}]\). We add the individual sample points with the same transformation \[\mathbf{Z}=\mathbf{X}\mathbf{M}\mathbf{J}_2. \tag{8}\]
A new sample point, \(\mathbf{x}^*\), can be added by interpolation \[\mathbf{z}^*=\mathbf{x}^*\mathbf{M}\mathbf{J}_2.\tag{9}\]
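Continuing the base-R sketch (weighted CVA on iris; illustrative only), the class means, individual samples and a new point can all be mapped into the biplot plane with the same transformation:

```r
# Base-R sketch: project class means, samples and a new point onto
# the first two canonical variates (weighted CVA on iris).
X <- scale(as.matrix(iris[, 1:4]), center = TRUE, scale = FALSE)
G <- model.matrix(~ Species - 1, data = iris)
Xbar <- solve(t(G) %*% G) %*% t(G) %*% X
B <- t(Xbar) %*% t(G) %*% G %*% Xbar
W <- t(X) %*% X - B
eW <- eigen(W, symmetric = TRUE)
Wih <- eW$vectors %*% diag(1 / sqrt(eW$values)) %*% t(eW$vectors)
M <- Wih %*% eigen(Wih %*% B %*% Wih, symmetric = TRUE)$vectors
J2 <- diag(ncol(X))[, 1:2]          # J_2 selects the first two columns
Zbar <- Xbar %*% M %*% J2           # class means in the biplot
Z <- X %*% M %*% J2                 # individual sample points
xstar <- X[1, , drop = FALSE]       # treat the first sample as a "new" point
zstar <- xstar %*% M %*% J2         # interpolated position
all.equal(zstar, Z[1, , drop = FALSE])   # agrees with the sample scores
```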
CVA function

| Argument | Description |
|---|---|
| bp | Object of class biplot. |
| classes | Vector of class membership. User specified, otherwise defaults to the vector specified in biplot. |
| dim.biplot | Dimension of the biplot. Only values 1, 2 and 3 are accepted, with default 2. |
| e.vects | Which eigenvectors (canonical variates) to extract, with default 1:dim.biplot. |
| weightedCVA | "weighted", "unweightedCent" or "unweightedI": controls which type of CVA to perform, with default "weighted". |
| show.class.means | TRUE or FALSE: controls whether class means are plotted, with default TRUE. |
| low.dim | "sample.opt" or "Bhattacharyya.dist": controls the method of constructing additional dimension(s) if dim.biplot is greater than the number of canonical dimensions, with default "sample.opt". |
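The summary output shown below appears to come from the built-in state.x77 data grouped by state.region (50 samples, 8 variables, 4 classes). Assuming the biplotEZ pipeline interface, a call along the following lines (argument names taken from the table above; the rest is an assumption, not verified against the package) would construct such a CVA biplot:

```r
# Assumed biplotEZ-style usage; a sketch, not verified against the package.
library(biplotEZ)
biplot(state.x77, scaled = TRUE) |>
  CVA(classes = state.region, weightedCVA = "weighted",
      dim.biplot = 2, show.class.means = TRUE) |>
  plot()
```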
Contains the following information on how well the biplot represents the original and canonical space:

- quality: Quality of fit for canonical and original variables
- adequacy: Adequacy of original variables
- axis.predictivity: Axis predictivity
- class.predictivity: Class predictivity
- within.class.axis.predictivity: Within-class axis predictivity
- within.class.sample.predictivity: Within-class sample predictivity

The summary() function prints to screen the fit measures stored in the object of class biplot.
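Under the same assumed interface, a fit-measures summary such as the one printed below could be obtained by something like (a hedged sketch; fit.measures() and summary() as described in the text):

```r
# Assumed usage; a sketch, not verified against the package.
library(biplotEZ)
biplot(state.x77, scaled = TRUE) |>
  CVA(classes = state.region) |>
  fit.measures() |>
  summary()
```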
# Object of class biplot, based on 50 samples and 8 variables.
# 8 numeric variables.
# 4 classes: Northeast South North Central West
#
# Quality of fit of canonical variables in 2 dimension(s) = 91.9%
# Quality of fit of original variables in 2 dimension(s) = 93.4%
# Adequacy of variables in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad
# 0.453533269 0.105327455 0.107221535 0.002201286 0.208653101 0.687840023
# Frost Area
# 0.452308013 0.118544323
# Axis predictivity in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad Frost
# 0.9873763 0.9848608 0.8757913 0.9050208 0.9955088 0.9970346 0.9558192
# Area
# 0.9344651
# Class predictivity in 2 dimension(s):
# Northeast South North Central West
# 0.8031465 0.9985089 0.6449906 0.9988469
# Within class axis predictivity in 2 dimension(s):
# Population Income Illiteracy Life Exp Murder HS Grad Frost
# 0.02246821 0.10349948 0.27870637 0.21460313 0.29836047 0.87510975 0.22320989
# Area
# 0.13603927
# Within class sample predictivity in 2 dimension(s):
# Alabama Alaska Arizona Arkansas California
# 0.769417280 0.174566384 0.328610375 0.148035077 0.103141908
# Colorado Connecticut Delaware Florida Georgia
# 0.357627854 0.079176621 0.438089663 0.327270922 0.558038750
# Hawaii Idaho Illinois Indiana Iowa
# 0.029173037 0.167543892 0.076948041 0.473148418 0.592667777
# Kansas Kentucky Louisiana Maine Maryland
# 0.774719240 0.439306768 0.190654770 0.086183357 0.284829878
# Massachusetts Michigan Minnesota Mississippi Missouri
# 0.428103056 0.188094295 0.644844800 0.163103449 0.719255739
# Montana Nebraska Nevada New Hampshire New Jersey
# 0.239142302 0.671350698 0.015766988 0.386053551 0.207503850
# New Mexico New York North Carolina North Dakota Ohio
# 0.012872885 0.008101305 0.872322617 0.457852394 0.092634247
# Oklahoma Oregon Pennsylvania Rhode Island South Carolina
# 0.561156131 0.158926944 0.261838286 0.482912999 0.229047767
# South Dakota Tennessee Texas Utah Vermont
# 0.095865021 0.237667483 0.121494852 0.349495632 0.256983459
# Virginia Washington West Virginia Wisconsin Wyoming
# 0.453608981 0.044780371 0.346223950 0.544998639 0.174849092
The rotate() function rotates the samples and axes in the biplot by rotate.degrees degrees.
The reflect() function reflects the samples and axes in the biplot about an axis: x (horizontal reflection), y (vertical reflection) or xy (diagonal reflection).
The argument zoom= is FALSE by default. If zoom=TRUE, a new graphical device is launched and the user is prompted to click on the desired upper-left and lower-right corners of the zoomed-in plot.
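The three utilities can be combined in one pipeline. The sketch below is an assumed usage (the exact argument form of reflect() is a guess based on the description above; not verified against the package):

```r
# Assumed usage of rotate(), reflect() and zoom= as described above.
library(biplotEZ)
biplot(state.x77, scaled = TRUE) |>
  CVA(classes = state.region) |>
  rotate(rotate.degrees = 90) |>   # rotate samples and axes by 90 degrees
  reflect("x") |>                  # horizontal reflection
  plot(zoom = FALSE)               # set zoom = TRUE to click-select a region
```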